Feature-Rich Information Extraction for the Technical Trend-Map Creation
Abstract
The authors used a word-sequence labeling method to extract technical effects and base technologies in the Technical Trend Map Creation Subtask of the NTCIR-8 Patent Mining Task. The method labels each word with a Conditional Random Field (CRF) trained on labeled data. The word features used for labeling are derived from explicit and implicit document structures, the technology fields assigned to the document, effect context phrases, phrase dependency structures, and a domain adaptation technique. Results of the formal run showed that the explicit document structure feature and the phrase dependency structure feature are effective for annotating patent data, and that the implicit document structure feature and the domain adaptation feature are also effective for annotating paper data.
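The word-sequence labeling described above can be sketched as a linear-chain CRF. Each token gets a bag of string features, and Viterbi decoding selects the highest-scoring label sequence. This is a minimal sketch under stated assumptions, not the authors' implementation. The label set (B-TECH, I-TECH, B-EFF, I-EFF, O), the feature names, the section and technology-field values, the cue words, the dependency heads, and the hand-set weights are all hypothetical. A real system would learn the weights from labeled data. The domain adaptation feature is omitted here.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical BIO label set for base technologies (TECH) and effects (EFF).
LABELS = ["O", "B-TECH", "I-TECH", "B-EFF", "I-EFF"]


def token_features(tokens: Sequence[str], i: int, section: str, tech_field: str,
                   effect_cues: Set[str], heads: Sequence[int]) -> List[str]:
    """Hypothetical per-token features, one per feature family in the abstract."""
    feats = [
        f"w={tokens[i]}",
        f"prev={tokens[i - 1] if i > 0 else '<s>'}",
        f"section={section}",    # explicit document structure (e.g. claims vs. description)
        f"field={tech_field}",   # technology field assigned to the document
    ]
    # Effect context phrase: a cue word within the three preceding tokens.
    if any(t in effect_cues for t in tokens[max(0, i - 3):i]):
        feats.append("near_effect_cue")
    # Phrase dependency structure: the token's head word (-1 marks the root).
    if heads[i] >= 0:
        feats.append(f"head={tokens[heads[i]]}")
    return feats


def viterbi(feature_seqs: List[List[str]],
            emit_w: Dict[Tuple[str, str], float],
            trans_w: Dict[Tuple[str, str], float],
            labels: Sequence[str] = LABELS) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF."""
    if not feature_seqs:
        return []

    def emit(i: int, y: str) -> float:
        # Sum of weights for the (feature, label) pairs active at position i.
        return sum(emit_w.get((f, y), 0.0) for f in feature_seqs[i])

    score = [{y: trans_w.get(("<s>", y), 0.0) + emit(0, y) for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(feature_seqs)):
        score.append({})
        back.append({})
        for y in labels:
            # Best previous label for current label y (ties go to the earlier label).
            prev, best = max(((q, score[i - 1][q] + trans_w.get((q, y), 0.0)) for q in labels),
                             key=lambda t: t[1])
            score[i][y] = best + emit(i, y)
            back[i][y] = prev
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: score[-1][y])]
    for i in range(len(feature_seqs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    tokens = ["the", "device", "reduces", "noise"]
    heads = [1, 2, -1, 2]  # hypothetical dependency heads
    feats = [token_features(tokens, i, "claims", "G06F", {"reduces"}, heads)
             for i in range(len(tokens))]
    emit_w = {("near_effect_cue", "B-EFF"): 2.0, ("w=device", "B-TECH"): 1.5}
    trans_w = {("O", "I-EFF"): -5.0, ("O", "I-TECH"): -5.0}  # discourage I- right after O
    print(list(zip(tokens, viterbi(feats, emit_w, trans_w))))
    # → [('the', 'O'), ('device', 'B-TECH'), ('reduces', 'O'), ('noise', 'B-EFF')]
```

Keeping features as plain strings makes it easy to add another feature family, such as one for domain adaptation, without changing the decoder. The abstract does not say how the authors encoded that feature.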
Similar resources
An Information Extraction Method for Multiple Data Sources
We developed an information extraction method for multiple data sources, that is, for various kinds of datasets such as Internet web pages. Because writing styles and vocabularies generally differ across kinds of data, information extraction over mixed datasets is typically no more accurate than extraction over a single kind of data. Our method divides the data by clusteri...
An Integrated Ontology Development Environment for Data Extraction
Data extraction is a necessary technology to deal with the huge and growing collection of unstructured and semistructured information available on the World Wide Web. Ontology-based data extraction is a robust approach, but the construction of ontologies is a technical task requiring the services of a human expert. We present a Java-based tool for the graphical creation and testing of data extr...
Experiments for NTCIR-8 Technical Trend Map Creation Subtask at Hitachi
This paper reports on an experiment to evaluate the extraction of effect expressions from patents and papers (in Japanese) in the Technical Trend Map Creation Subtask of the NTCIR-8 Patent Mining Task. To obtain a more detailed structure for the expressions, we defined effect expressions as consisting of TARGET, SCALE, and IMPACT elements. We created training data based on these elements and assig...
Developing a New Method in Object Based Classification to Updating Large Scale Maps with Emphasis on Building Feature
As cities expand, updating urban maps is important for urban planning, and its effectiveness depends on the accuracy of information extraction and change detection. Information extraction methods fall into two groups: Pixel-Based (PB) and Object-Based (OB). OB analysis has overcome the limitations of PB analysis (producing salt-and-pepper results and features with hole...
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique that eliminates irrelevant and redundant features, which improves classifier performance. When a dataset contains many irrelevant and redundant features, classifier accuracy does not improve and performance degrades. To avoid this, this paper presents a new hybrid feature selection method usi...
Publication date: 2010